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Abstract 

Probabilistic  programming  languages  allow  a  mod¬ 
eler  to  build  probabilistic  models  using  complex 
data  structures  with  all  the  power  of  a  program¬ 
ming  language.  We  present  CTPPL,  an  expressive 
probabilistic  programming  language  for  dynamic 
processes  that  models  processes  using  continuous 
time.  Time  is  a  first  class  element  in  our  language; 
the  amount  of  time  taken  by  a  subprocess  can  be 
specified  using  the  full  power  of  the  language.  We 
show  through  examples  that  CTPPL  can  easily  rep¬ 
resent  existing  continuous  time  frameworks  and 
makes  it  easy  to  represent  new  ones.  We  present 
semantics  for  CTPPL  in  terms  of  a  probability  mea¬ 
sure  over  trajectories.  We  present  a  particle  filtering 
algorithm  for  the  language  that  works  for  a  large 
and  useful  class  of  CTPPL  programs. 


1  Introduction 

Probabilistic  programming  languages  (e.g.  IBAL  [Pfeffer, 
2007],  BLOG  [Milch,  2006],  Church  [Goodman  et  al. ,  2008]) 
are  an  exciting  development  in  probabilistic  knowledge  repre¬ 
sentation.  They  allow  a  modeler  to  build  probabilistic  models 
using  complex  data  structures  with  the  full  power  of  program¬ 
ming  languages.  Most  probabilistic  programming  languages 
that  have  been  developed  are  static,  but  ProPL  [Pfeffer,  2005] 
and  D-BLOG  [de  Salvo  Braz  et  al.,  2008]  extend  such  lan¬ 
guages  to  dynamic  models.  However,  they  both  use  discrete 
time. 

There  are  a  number  of  arguments  for  using  continuous  time 
models.  First,  most  real-world  processes  take  place  in  con¬ 
tinuous  time;  discretization  is  an  artificial  imposition  to  force 
them  into  a  discrete  time  framework.  Second,  when  we  model 
a  process  in  discrete  time,  we  have  to  make  the  discrete  time 
increments  fine  enough  to  capture  all  the  important  events, 
but  most  of  the  time  reasoning  at  such  a  fine  time  granularity 
is  unnecessary.  In  the  context  of  probabilistic  programming 
languages,  there  is  a  third  argument.  When  we  model  a  pro¬ 
cess  in  continuous  time,  it  becomes  natural  to  make  the  time 
taken  by  a  subprocess  be  a  variable  which  can  be  specified  us¬ 
ing  the  full  power  of  the  language.  In  short,  time  becomes  a 
first  class  element  of  the  language.  In  contrast,  in  discrete 


time  representations,  usually  the  world  proceeds  from  one 
time  step  to  the  next,  and  the  time  between  them  is  fixed. 

In  this  paper  we  present  CTPPL  (pronouced  “Cat  Peo¬ 
ple”),  a  continuous  time  probabilistic  programming  language. 
CTPPL  is  similar  to  ProPL,  but  is  different  in  four  main  ways: 
(1)  values  in  CTPPL  are  discrete  or  continuous;  (2)  time  is 
continuous;  (3)  probability  and  time  are  first  class  elements 
of  the  language;  and  (4)  inference  is  conducted  by  sampling 
trajectories  through  a  state  space,  rather  than  constructing  a 
giant  dynamic  Bayesian  network.  We  first  present  the  syntax 
of  the  language  and  some  examples.  When  combining  the 
expressive  power  of  probabilistic  programming  with  continu¬ 
ous  time,  both  semantics  and  inference  become  challenging. 
In  Section  4  we  present  the  semantics  of  a  CTPPL  program 
as  defining  a  probability  measure  over  a  space  of  equivalence 
classes  of  trajectories  through  state  space,  where  the  trajec¬ 
tories  are  partly  discrete  and  partly  continuous.  In  Section  5 
we  present  a  particle  filtering  algorithm  that  works  for  a  large 
and  useful  class  of  CTPPL  programs.  At  the  core  of  our  al¬ 
gorithm  is  an  importance  sampling  algorithm  that  guarantees 
that  if  the  set  of  particles  at  one  time  point  is  consistent  with 
the  next  set  of  observations  that  are  received,  then  with  pos¬ 
itive  probability  the  algorithm  will  generate  a  new  particle 
with  positive  weight. 


2  The  Language 


The  basic  unit  of  the  CTPPL  language  is  the  expression, 
which  describes  a  computation  that  executes  stochastically 
in  time  while  producing  a  value.  While  it  is  executing,  it 
may  produce  emissions,  which  are  observations  that  happen 
at  particular  points  in  time.  Expressions  can  be  any  one  of  the 
following  forms: 


c 

n 

if  ei  then  e 2  else  e 3 

lambda  n.e 

eo(ei) 

e\  ®  e2 

{ei,e2} 

left  e 

right  e 

f  lip  e 

9(e) 

first  [ei,  e2] 
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Variable 

Conditional 

Function  definition 

Application 

Binary  operator 

Pair  construction 

Component  extraction 

Component  extraction 

Discrete  probabilistic  choice 

Continuous  probabilistic  choice 

First  expression 
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emitei;e2  Emission 

delay  ei;e2  Delay  expression 
now  Now  expression 

Constants  can  be  boolean,  string,  integer  or  floating  point. 
While  variables,  conditionals,  function  definition,  applica¬ 
tion,  pair  construction  and  component  extraction  are  familiar, 
some  explanation  is  needed  of  how  they  specify  an  execu¬ 
tion  in  time.  In  a  conditional  if  e\  then  e2  else  e 3,  the 
test  ei  is  first  evaluated.  After  it  has  completed,  either  e2 
or  e3  is  evaluated,  depending  on  the  result  of  e\.  In  a  func¬ 
tion  application  eo(ei),  eo  and  e1  are  evaluated  in  parallel  to 
produce  values  to  and  V\ .  After  they  have  both  completed, 
the  body  of  vq  is  evaluated  with  its  formal  argument  bound 
to  vi.  For  binary  operators,  the  two  arguments  are  evaluated 
in  parallel.  Once  they  have  terminated  the  result  is  produced 
immediately.  In  a  pair  construction  {ei,e2},  e±  and  (■■2.  are 
evaluated  in  parallel.  The  time  of  completion  of  the  entire 
pair  construction  is  the  later  of  the  completion  times  of  e\ 
and  e2. 

Discrete  probabilistic  choice  is  familiar  from  other  prob¬ 
abilistic  programming  languages  such  as  IBAL  and  Church: 
flip  e  means  evaluate  e  to  produce  a  non-negative  floating 
point  value  v,  and  then  with  probability  v  return  T,  otherwise 
return  F.  Note  that  unlike  IBAL  and  ProPL,  but  like  Church, 
the  probability  is  defined  by  an  expression,  so  is  a  first  class 
element  of  the  language.  Continuous  probabilistic  choice  is 
new  to  CTPPL.  Here  q  is  a  probabilistic  primitive  that,  given  a 
parameter,  defines  a  probability  density  function.  The  mean¬ 
ing  of  q(e)  is  to  first  evaluate  e  to  produce  a  value  v  for  the 
parameter,  and  then  generate  a  value  from  the  density  func¬ 
tion  defined  by  q  given  the  parameter  value  v.  The  language 
is  fully  general  in  allowing  any  primitive,  as  long  as  (1)  we 
can  generate  samples  from  it,  and  (2),  we  can  compute  its 
density  at  any  point. 

The  meaning  of  first  [ei,e2]  is  to  begin  evaluating  e± 
and  62  in  parallel.  As  soon  as  either  of  them  completes  with 
a  value  v,  the  entire  first  expression  completes  with  value  v. 
Ties  are  broken  in  favor  of  ei.  The  meaning  of  emit  e\  \  e-2 
is  to  evaluate  e\  to  produce  a  value  v,  completing  at  time  t.  v 
is  then  emitted  at  time  t.  Finally  e2  is  evaluated,  beginning  at 
time  t,  to  produce  the  result  of  the  expression. 

Passage  of  time  is  specified  through  delay  expressions.  The 
meaning  of  the  expression  delay  e\\  e-i  is  to  first  evaluate  e± 
to  produce  positive  floating  point  value  v,  terminating  at  time 
t.  Then  e 2  is  evaluated,  beginning  at  time  t  +  v.  In  other 
words,  there  is  a  delay  of  v  between  the  time  e\  ended  and 
the  time  e2  begins.  The  expression  e\  is  called  the  delaying 
subexpression  of  the  delay  expression.  While  delay  expres¬ 
sions  are  present  in  ProPL,  they  are  different  in  CTPPL.  The 
two  differences  are  that  the  delay  time  is  continuous,  and  the 
delay  time  is  specified  by  an  arbitrary  expression  (in  ProPL, 
it  is  specified  by  an  integer  constant).  Thus  the  delay  is  a  first 
class  element  of  the  language.  The  final  construct,  which  fur¬ 
ther  makes  time  a  first  class  element  of  the  language,  is  now, 
which  always  returns  immediately,  evaluating  to  the  current 
time  at  the  time  of  its  evaluation. 

In  addition  to  the  above  bare-bones  language,  CTPPL  pro¬ 
vides  a  good  deal  of  syntactic  sugar.  One  of  these  is  the 
expression  dist  [ef  :  e^,  ...,e£  :  e£],  which  is  a  general¬ 


ization  of  discrete  probabilistic  choice.  The  meaning  is  to 
evaluate  e\, in  parallel  to  produce  a  set  of  non-negative 
floating  point  values,  normalize  the  values  to  obtain  proba¬ 
bilities  pi, ... ,pn ,  choose  one  of  the  e\  with  probability  p>, 
and  then  evaluate  e\  to  produce  the  result.  Others  include 
let  expressions,  in  which  a  variable  is  bound  to  a  value  to  be 
used  in  a  result  expression;  case  expressions,  analogous  to 
switch  expressions  in  C;  functions  of  many  arguments;  tuples 
of  many  components;  first  expressions  with  many  subexpres¬ 
sions;  records  with  named  fields;  and  lists  with  supporting 
functions.  We  will  use  syntactic  sugar  freely  in  the  examples. 
However,  we  will  restrict  attention  to  the  core  language  when 
defining  the  semantics  and  inference  algorithm. 

We  impose  the  following  restrictions  on  models:  (1) 
In  an  emission,  the  emitted  value  cannot  be  float¬ 
ing  point.  (2)  In  a  delay  expression,  the  delaying 
subexpression  must  be  continuously  distributed  at  the 
time  of  evaluation,  with  no  points  of  positive  prob¬ 
ability  mass;  for  example,  the  delaying  subexpression 
in  let  x  =  uniform  (1.0)  in  delay  x;  1  is  not 
continuously  distributed,  because  at  the  time  of  evaluation  x 
is  bound  to  a  value.  (3)  Emissions  are  not  allowed  to  be  pro¬ 
duced  while  executing  a  delaying  subexpression.  (4)  Delays 
must  be  positive.  (5)  Floating  point  values  cannot  be  com¬ 
pared  for  equality,  only  inequality.  (6)  Floating  point  arith¬ 
metic  binary  operators  are  required  to  be  invertible.  We  also 
assume  that  at  any  time  point,  with  probability  1  the  process 
will  either  progress  to  another  time  point  or  terminate.  Even 
with  these  restrictions,  the  language  allows  many  useful  and 
interesting  processes  to  be  represented,  and  the  expressive 
power  goes  far  beyond  that  of  any  existing  language. 

3  Examples 

In  recent  years  there  has  been  a  flurry  of  interest  in  continuous 
time  models,  mostly  focused  on  continuous  time  Bayesian 
networks  (CTBNs)  [Nodelman,  2007],  CTBNs  are  built  on 
homogenous  Markov  processes.  A  homogenous  Markov  pro¬ 
cess  is  a  finite  state,  continuous  time  process,  consisting  of  an 
initial  distribution  P°  and  intensity  matrix 
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The  meaning  is  that  if  the  process  in  state  i,  it  stays  in  state 
one  for  an  amount  of  time  governed  by  an  exponential  distri¬ 
bution  with  parameter  q,  ,  then  transitions  to  qt  with  probabil¬ 
ity  q,:j  / q, .  We  can  write  this  in  CTPPL  using 

x  ( )  =  dist  [Pj3  :  xl(),  P°  :  xn  ( )  ] 

xl()  =  delay  exp  (<71); 

dist  [(712/91  x2(),  .  .  . ,  qin/di  ■  xn  ( )  ] 

Some  state  transitions  can  produce  emissions. 

In  a  CTBN,  each  variable  has  a  conditional  inten¬ 
sity  matrix  Q"  for  every  combination  of  values  u  of 
its  parents.  To  capture  CTBNs,  we  use  three  helper 
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functions  that  are  easy  to  write  and  are  not  shown. 
change_ith ( list ,  i,  value)  changes  the  i-th  el¬ 
ement  of  list  to  value,  keeping  the  rest  as  in  list, 
extract-indices  ( list,  indices)  returns  a  tuple 
consisting  of  the  elements  at  the  given  indices  of  the  input 
list,  f  irst_list_i  ( f ,  list )  is  a  higher  order  function 
that  takes  a  function  f,  applies  it  to  each  member  of  list, 
and  returns  the  first  produced  result  (like  a  first)  expres¬ 
sion.  The  function  f  takes  two  arguments,  the  element  and 
the  index  of  the  element  in  list.  In  a  CTPPL  model  of  a 
CTBN,  each  variable  has  a  process  of  the  form: 

xO (previous)  = 

case  {previous ,xl, extract.indices ([2,4])} 
of  {l,u}  -> 
delay  exp  (g“)  ; 

dist  [912/91  :  2,...,  q\Jc£C-  n  ] 

Here  [2,4]  are  the  indices  of  the  parents  of  xO.  Again,  some 
transitions  can  produce  emissions.  A  list  of  variable  pro¬ 
cesses  is  passed  to  the  ctbn  function,  which  works  with  ar¬ 
bitrarily  many  variables.  The  init  ()  function  is  a  specifi¬ 
cation  of  a  Bayesian  network  defining  the  initial  state. 

ctbn (variables )  = 
let  process (previous )  = 
process 
( f  irst_list_i 
(lambda  (v,i)  . 

change_ith  (previous,  i,  v  (previous )))  , 
variables ) 
in  process (init ()  ) 

Some  authors  [Gopalratnam  et  al. ,  2005;  Nodelman  el  al., 
2005]  have  studied  how  to  extend  CTBNs  to  allow  Erlang- 
Coxian  and  phase  distributions  for  the  delays.  This  can  easily 
be  done  in  CTPPL  since  delaying  subexpressions  can  be  ar¬ 
bitrary  expressions. 

CTPPL  goes  well  beyond  CTBNs  in  two  important  ways. 
The  first  is  that  we  can  define  processes  over  arbitrarily  com¬ 
plex  data  structures.  The  second  is  that  time  is  a  first  class 
element  of  the  language.  So,  for  example,  we  could  have  de¬ 
pendent  delays,  as  in 

let  x  =  uniform(2.0)  in 
{delay  exp(x)  in  ''a'', 
delay  exp(x+l)  in  ''b''} 

or  we  could  have  a  delay  dependent  on  the  value  produced  by 
the  delay  expression,  as  in 

let  x  =  uniform(2.0)  in 
delay  exp(x)  in  x 

To  illustrate  the  power  of  these  two  ideas  in  combination, 
we  develop  a  model  of  musical  performance.  We  begin 
with  a  simple  model  of  monophonic  performance,  follow¬ 
ing  [Cemgil  and  Kappen,  2003],  The  play  function  takes 
as  arguments  the  current  tempo,  and  the  input,  which  is  a  list 
of  {time,  pitch}  pairs.  The  time  element  is  the  musical  dura¬ 
tion  before  which  the  pitch  should  be  played.  In  the  following 


code,  posnormal  denotes  a  truncated  normal  distribution, 
with  the  given  mean  and  variance,  constrained  to  be  positive. 
Note  that  both  the  delay  and  the  variance  of  the  new  tempo 
depend  on  the  duration. 

play (tempo, input )  = 
if  empty (input) 
then  true  //  termination 
else 

let  duration  =  left  head (input)  in 
delay  posnormal ({duration  *  tempo, 
0.01}); 

emit  right  head ( input); 
let  newrtempo  = 
posnormal ({tempo, 

0.04  *  duration  *  duration})  in 
play  (new_tempo,  tail (input)) 

In  CTPPL  it  is  easy  to  generalize  this  model  to  ar¬ 
bitrarily  many  voices.  With  multiple  voices  the  per¬ 
formed  pitches  of  the  voices  are  interleaved.  The  follow¬ 
ing  code  uses  the  standard  higher-order  function  map_i, 
which  is  similar  to  first_list_i  above  except  that 
it  maps  a  two-argument  function  over  all  elements  of 
a  list.  filter  (non_empty,  new_input)  returns 
new_input  with  all  empty  lists  removed.  The  idea  behind 
the  code  is  that  after  each  emission,  it  attempts,  in  parallel, 
to  process  each  voice,  generating  the  time  of  the  next  emis¬ 
sion  for  that  voice,  and  the  transformed  input  after  the  voice 
has  played  its  next  note.  In  this  input  transformation,  pro¬ 
duced  by  mapping  the  helper  function  over  the  old  input, 
the  voice  itself  is  replaced  with  its  tail,  because  its  note  has 
been  played,  whereas  for  all  the  other  voices  the  duration  un¬ 
til  the  played  note  is  subtracted  from  the  time  to  the  first  note. 
Now,  all  this  is  done  in  parallel  for  each  of  the  voices,  but 
only  the  first  to  produce  its  emission  is  actually  used.  In  this 
way,  the  emissions  of  all  the  voices  are  interleaved. 

play (tempo,  input)  = 
if  empty (input)  then  true  else 
let  process (l,i)  = 
let  duration  =  left  head(l)  in 
delay  posnormal ({duration  *  tempo, 

0.01})  ; 

emit  right  head(l); 
let  helper (m, j)  = 
if  j  ==  i 
then  tail  m 

else  {left  head(m)  -  duration, 
right  head(m)}  ::  tail (m)  } 
in 

let  new_input  =  map_i  (helper,  input)  in 
let  newrtempo  = 
posnormal ({tempo, 

0.04  *  duration  *  duration})  in 
{new.tempo,  filter  (non_empty,  new_input)} 
in 

let  {  new.tempo,  new_input  }  = 
first_list_i  (process,  input)  in 
play (newjtempo,  new_input) 
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4  Semantics 


The  probabilistic  transition  rules  are: 


The  semantics  of  a  CTPPL  program  is  a  probability  measure 
over  a  set  of  equivalence  classes  of  trajectories  through  state 
space.  A  state  consists  of  a  CTPPL  expression,  a  time  stamp, 
and  a  set  of  marks  on  subexpressions  of  the  expression.  In¬ 
tuitively  a  state  is  a  point  in  the  execution  of  a  process.  The 
expression  defines  the  process  currently  being  executed.  The 
time  stamp  is  the  current  time.  The  marks  indicate  which 
subexpressions  of  the  expression  are  being  processed.  Since 
execution  can  occur  in  parallel,  there  may  be  multiple  marks. 
Our  notation  for  a  state  is  a  pair  of  an  expression  and  a  time 
stamp,  with  a  bullet  in  front  of  each  of  the  marked  subexpres¬ 
sions.  For  example  (if  •  e\  then  e2  else  63, 3)  is  the 
state  consisting  of  the  if  expression,  with  time  stamp  3,  with 
a  mark  on  e\. 

States  can  transition  to  other  states.  There  are  three  kinds 
of  transitions:  free  transitions,  which  do  not  involve  probabil¬ 
ities  or  time  delays;  probabilistic  transitions,  which  involve 
probabilistic  choice  but  no  time  delays;  and  temporal  tran¬ 
sitions,  in  which  the  time  stamp  of  the  state  changes.  Free 
transitions  are  expressed  using  rewrite  rules.  Since  the  time 
stamp  does  not  change,  we  omit  the  time  stamp  part  of  the 
state  in  the  following  rules.  In  the  following  transition  rules,  v 
denotes  an  expression  which  directly  specifies  a  value,  which 
could  be  a  constant,  a  lambda  expression,  or  a  pair  of  val¬ 
ues.  Also  the  notation  w  denotes  an  expression  that  does  not 
directly  specify  a  value.  The  notation  e[x/v\  denotes  the  ex¬ 
pression  produced  from  replacing  all  unshadowed  occurences 
of  x  in  e  with  v.  The  free  transition  rules  are: 


•  if  ei  then  e 2  else  e3 
if  •  T  then  e 2  else  e3 
if  •  F  then  e2  else  e3 
•(eo(ei)) 

•  (lambda  n.e)(* v) 

•(ei  ®  e2) 

(•fi)  ©  (*u2) 

•{ei,e2} 

{•Vl,*U2} 

•  left  e 
left  •  v 

•  right  e 
right  • v 

•  flip  e 
• q(e ) 

•  first  [ei, e2] 
first  [•v1,»e2] 
first  [•Mti,  •i>2] 
delay  •  0;  e 

•  emit  ei ;  e2 
emit  •  iq;  e2 


if  •  ei  then  e2  else  e3 

•e2 

•e3 

(•e0)(*ei) 

•e[n/v\ 

(•ei)  ©  (*e2) 

•  (ui  ©  v2) 

{•ei,*e2} 

•{th,f2} 

left  • e 

•x  where  v  =  { x ,  y} 
right  • e 
•y  where  v  =  { x ,  y} 
flip  • e 
g(*e) 

first  [•ei,*e2] 

•Vl 

•v2 

•e 

emit  •  ep,  e2 

•e2,  while  emitting  v\ 


The  expression  now  also  produces  a  free  transition,  but  it  de¬ 
pends  on  the  time.  The  rule  is  (*now,  t)  — >  (*f,  t). 

Probabilistic  transitions  specify  a  probability  distribution 
or  probability  density  function  over  next  states.  When  exe¬ 
cuting  a  probabilistic  transition,  the  process  transitions  to  an¬ 
other  state  with  the  given  probability  or  probability  density. 


J  T  with  probability  v 
1P  *  v  (  F  with  probability  1  —  v 

q(»v)  x  with  density  q(v)  (x) 

Temporal  transitions  are  continuous.  A  temporal  transition 
begins  in  a  time-terminal  state,  which  is  a  state  in  which  no 
free  or  probabilistic  transitions  are  possible.  For  example, 

(delay  (delay  •  2.0;  exp(3.0)); 

if  flip  0.5  then  ''a''  else  '  'b' ' , 

7.2) 

is  a  time-terminal  state.  Note  that  when  there  are  nested 
delays,  not  all  the  delaying  subexpressions  in  a  time- 
terminal  state  need  specify  a  value.  For  example,  in 
delay  (delay  2.0;  unif  orm  ( 3 . 0 )  )  ;  1,  only  the 
inner  delay’s  delaying  subexpression  specifies  a  value.  If  a 
time-terminal  state  contains  no  delaying  subexpressions,  no 
temporal  transition  is  possible.  Otherwise,  define  the  mini¬ 
mum  delay  time  of  a  time-terminal  state  to  be  the  minimum 
of  the  values  specified  by  its  value  specifying  delaying  subex¬ 
pressions.  Let  (e,  t)  be  a  time-terminal  state  with  minimum 
delay  time  5.  The  process  transitions  through  a  continuum  of 
states  ( eu,u )  for  u  £  [t,t  +  5],  where  eu  is  the  same  as  e 
except  that  every  value  specifying  delaying  subexpression  v 
is  replaced  with  v—(u  —  t).  At  time  t  +  S,  one  of  the  delaying 
subexpressions  will  be  0,  so  a  free  transition  is  available  and 
the  process  can  continue. 

No  transitions  are  allowed  on  expressions  that  specify  a 
value.  The  rules  for  applying  the  transitions  are  as  follows: 

1.  Let  i  =  0,  and  let  to  =  0. 

2.  Execute  free  and  probabilistic  transitions  at  ti  that  are 
not  applied  to  delaying  subexpressions  until  no  more 
are  available.  Transitions  are  performed  in  a  depth-first 
manner,  but  the  order  in  which  parallel  subexpressions 
are  resolved  is  arbitrary,  except  that  first  subexpres¬ 
sions  must  be  resolved  in  the  order  in  which  they  appear. 

3.  If  there  are  no  delaying  subexpressions,  the  expression 
must  specify  a  value,  and  the  process  terminates. 

4.  Otherwise,  let  the  state  be  called  the  pre-delay- 
resolution  state  (PDRS)  at  time  ti. 

5.  At  this  point,  the  delaying  subexpressions  are  resolved 
one  by  one  using  free  and  probabilistic  transitions.  If 
there  are  no  nested  delays,  a  delaying  subexpression  will 
resolve  to  specify  a  value.  If  there  are  nested  delays,  the 
outer  delaying  subexpressions  will  not  resolve  to  a  value, 
but  eventually  some  of  the  nested  delays  will. 

6.  Finally,  we  reach  a  time  terminal  state  at  £,  and  execute 
a  temporal  transition  through  [f,,  ii+1],  We  set  i  <—  i  + 1 
and  continue  with  step  2. 

A  trajectory  consists  of  a  discrete  sequence  of  states  at  time 
to  =  0,  followed  by  a  continuum  of  states  up  to  time  fi, 
followed  by  another  discrete  sequence  of  states  at  time  ti, 
followed  by  another  continuum  of  states  up  to  time  t2,  and 
so  on.  A  trajectory  also  determines  a  sequence  of  emissions, 
defined  as  a  list  [f0  :  toq,  t\  :  mi, ...],  where  nil  is  the  set  of 
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emissions  that  happen  at  time  ti.  The  trajectory  and  emission 
sequence  may  or  may  not  terminate. 

We  define  an  equivalence  relation  on  trajectories,  saying 
that  two  trajectories  are  equivalent  if  they  contain  the  same 
transitions,  except  that  parallel  subexpressions  may  be  re¬ 
solved  in  different  orders.  Equivalent  trajectories  have  the 
same  PDRSs,  time-terminal  states,  temporal  transitions  and 
emission  sequence.  The  rules  specified  above  define  a  prob¬ 
ability  measure  over  equivalence  classes  of  trajectories.  The 
elementary  sets  consist  of  sets  of  equivalence  classes  contain¬ 
ing  the  same  free  transitions,  the  same  discrete  probabilistic 
transitions,  and  in  which  the  continuous  probabilistic  transi¬ 
tions  fall  in  an  interval.  The  density  of  an  equivalence  class 
is  the  product  of  the  probabilities  of  its  discrete  probabilis¬ 
tic  transitions  and  the  densities  of  its  continuous  probabilistic 
transitions.  Because  all  members  of  an  equivalence  class  have 
the  same  probabilistic  transitions,  the  probability  of  an  equiv¬ 
alence  class  is  equal  to  the  product  of  the  probabilities  and 
densities  of  the  probabilistic  transitions  in  any  member.  In 
the  next  section,  we  will  use  the  term  path  to  describe  a  con¬ 
tiguous  sub-trajectory  of  a  trajectory,  and  we  will  use  the  term 
“probability  of  a  path”  to  denote  the  product  of  the  probabil¬ 
ities  and  densities  of  the  probabilistic  transitions  in  the  path. 

This  semantics  is  well  able  to  handle  cases  where  a  poten¬ 
tially  unbounded  number  of  processes  happen  in  parallel.  For 
example,  consider  the  program: 

let  f (x)  = 
delay  exp  ( 1 ) ; 

first  [delay  exp  (2);  x,  f(x+l)] 
in  f ( 0 ) 

We  will  first  expand  f  ( 0 )  to  transition  to  the  PDRS 

delay  •  exp  ( 1 ) ; 

first  [delay  exp (2);  0,  f  ( 1 ) ] 

We  will  resolve  the  delaying  subexpression  exp(l), 
choosing  a  value,  say  0.7.  We  will  then  take  a 
temporal  transition  to  time  0.7,  and  then  transition  to 
•  first  [delay  exp  (2);  0,  f  ( 1 )].  Next,  we  will 
move  the  mark  inside  the  first  expression,  and  expand  f  ( 1 ) 
to  transition  to  the  PDRS 

first  [delay  •  exp(2);0, 
delay  •  exp  ( 1 ) ; 

first  [delay  exp(2);  1,  f ( 2 ) ] ] 

Let’s  say  we  sample  1.3  and  0.5  for  the  two  delaying  subex¬ 
pressions.  We  will  take  a  temporal  transition  to  time  1.2  and 
the  expression 

first  [delay  •  0.8;  0, 

•  first  [delay  exp(2);  1,  f ( 2 ) ] ] 

Now  we  will  move  the  second  mark  inward  and  expand  f  ( 2 ) 
to  reach  another  PDRS 

first  [delay  •  0.8;  0, 

first  [delay  •  exp(2);  1, 
delay  •  exp ( 1 ) ; 

first  [delay  exp(2);  2,  f ( 3 )  ]  ]  ] 


Now  we  will  resolve  the  latter  two  delaying  subexpressions. 
Let’s  say  we  get  0.9  and  1.2.  We  will  take  a  temporal  transi¬ 
tion  to  time  2.0  and  transition  to 

first  [•  0, 

first  [delay  •  0.1;  1, 
delay  •  0.4; 

first  [delay  exp (2);  2,  f ( 3 ) ] ] ] 

By  the  rules  for  first  expressions,  this  resolves  to  0.  Even 
though  the  number  of  parallel  processes  is  unbounded,  in  any 
given  run  the  process  will  terminate  in  a  finite  amount  of  time. 
Each  such  finite  run  has  a  well-defined  probability  of  happen¬ 
ing. 

By  contrast,  consider  the  program 
let  f (x)  = 

first  [  delay  exp(0);  x,  f(x+l)] 
in  f ( 0 ) 

This  is  a  divergent  program.  Infinitely  many  parallel  branches 
will  be  expanded  before  any  delaying  subexpression  is  re¬ 
solved.  Our  semantics  does  not  handle  such  a  program,  and 
it  is  disallowed. 

5  Inference 

Our  goal  is  to  monitor  the  state  of  the  system  over  time  based 
on  observations.  Our  observations  consist  of  a  stream  of 
emissions.  At  each  time  point  i  =  0,1, ....  we  want  to  es¬ 
timate  the  probability  distribution  over  the  PDRS  at  time  tt 
given  the  sequence  of  emissions  up  to  time  We  choose  the 
PDRS  because  of  two  properties:  (i)  no  more  emissions  are 
possible  at  t,  after  the  PDRS  has  been  reached  (because  of  our 
restriction  that  executing  a  delaying  subexpression  is  not  al¬ 
lowed  to  produce  emissions);  and  (ii),  no  delaying  subexpres¬ 
sions  have  been  resolved,  so  stopping  at  the  PDRS  is  making 
a  minimal  commitment  about  what  time  ti+ 1  will  be.  For¬ 
mally,  we  want  to  estimate  P(S(ti)\mo,  ■  mi),  where  S(ti) 
denotes  the  PDRS  at  time 

We  will  estimate  the  distribution  at  time  f,:  using  a  set  of 
particles.  As  usual,  we  proceed  recursively,  beginning  with 
an  initial  estimate  of  P(5(0)|mo),  and  then  recursively  esti¬ 
mating  P(S(ti)\mo,  by  repeatedly  choosing  a  parti¬ 

cle  Sj—  i  from  P(S(ti_ i)),  and  sampling  a  particle  s,-  from 
P{S{ti)\S{ti-i)  =  Si-i, mi). 

One  might  think  that  the  simplest  way  to  do  this  is  to  use 
rejection  sampling.  We  can  define  a  sampling  process  using 
the  transition  rules  from  Section  4.  We  begin  by  choosing  a 
PDRS  Si-i  at  random  from  our  previous  set  of  particles  at 
time  ti- 1-  We  then  resolve  the  delaying  subexpressions  until 
we  reach  a  time-terminal  state  with  minimum  delay  time  5. 
Let  t'  be  i  +  S.  We  then  reason  as  follows: 

1.  If  t'  <  ti ,  we  complete  the  temporal  transition  to  time 
t' .  We  then  execute  some  free  and  probabilistic  transi¬ 
tions  until  we  reach  a  PDRS  at  time  t' .  If  in  the  process 
an  emission  is  produced,  we  reject  the  sample  because 
according  to  our  observations  there  are  no  emissions  be¬ 
tween  ti- 1  and  ti.  Otherwise,  we  continue  the  process 
beginning  with  the  new  PDRS. 
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2.  If  if  =  ti ,  we  complete  the  temporal  transition  to  time 
ti,  perform  some  more  free  and  probabilistic  transitions, 
and  finally  arrive  at  a  PDRS  st  at  time  /,t.  We  check  to 
see  if  the  emissions  ml  have  been  produced.  If  so,  we 
accept  the  sample  and  add  to  the  set  of  samples  at  time 
ti.  If  not,  we  reject  the  sample. 

3.  If  tf  >  ti ,  we  can  immediately  reject  the  sample.  This 
is  because  no  emissions  are  possible  in  the  middle  of  a 
temporal  transition. 

The  problem  with  this  approach  is  that  we  can  only  ac¬ 
cept  a  sample  if  t!  =  ti.  Since  delay  times  are  continuously 
distributed,  this  has  probability  zero  of  happening.  Essen¬ 
tially  the  problem  is  that  rejection  sampling  cannot  be  used 
when  conditioning  on  events  of  measure  zero.  Instead,  we 
use  a  particle  filtering  approach.  Ng  et  al.  [Ng  el  al.,  2005] 
present  a  particle  filtering  method  for  continuous  time,  hy¬ 
brid  state  processes.  Fan  and  Shelton  [Fan  and  Shelton,  2008] 
present  importance  sampling  and  particle  filtering  algorithms 
for  CTBNs  which  form  the  basis  for  our  approach.  As  in 
standard  particle  filtering,  we  begin  with  an  unweighted  set 
of  samples  at  time  f,_i.  We  choose  a  sample  Sj_i  at  ran¬ 
dom  from  this  set  and  use  importance  sampling  to  gener¬ 
ate  a  weighted  particle  ■sl  at  time  ti.  We  then  resample  the 
weighted  particles  to  produce  a  set  of  unweighted  particles 
which  is  our  estimate  P(S(ti)\mo, ...,  mi). 

5.1  Importance  Sampling 

The  importance  sampling  algorithm  we  use  is  actually  an 
importance/rejection  algorithm,  since  rejections  are  allowed. 
However,  we  will  guarantee  that  if  m,  has  positive  proba¬ 
bility  given  our  estimate  of  P(Sj_i|mo, ...,  then  with 

positive  probability  a  new  sample  s,  will  be  generated  with 
positive  weight.  The  main  idea  of  our  algorithm  is  that  some 
delays  are  key  delays.  Suppose  we  begin  with  a  PDRS  (e,  t). 
We  proceed  to  resolve  the  delaying  subexpressions.  At  the 
time  we  process  the  last  delaying  subexpression  e',  let  the 
minimum  delay  time  of  all  the  delaying  subexpressions  that 
have  already  been  resolved  to  specify  a  value  be  S,  and  let 
tf  =  t  +  S.  If  t'  <  ti,  then  we  know  at  the  time  we  process  e' 
that  the  temporal  transition  will  take  us  to  a  time  at  or  before 
t' ,  which  is  no  later  than  f,.  So  we  need  place  no  constraints 
on  the  resolution  of  e' .  If,  on  the  other  hand,  tf  >  ti,  we  say 
that  ef  is  a  key  delay.  We  know  that  if  e'  resolves  to  a  value 
S'  >  ti  —  tf ,  we  will  be  forced  to  reject  the  sample.  There¬ 
fore,  when  we  encounter  a  key  delay  ef  we  use  the  following 
procedure. 

1.  Sample  a  resolution  of  e! . 

2.  If  resolving  el  results  in  one  or  more  nested  delays,  let 
e"  be  the  delaying  subexpression  in  ef  that  is  the  last  to 
be  processed.  We  can  reason  about  ef'  in  the  same  way 
as  ef  and  continue  the  sampling  process. 

3.  Otherwise,  resolving  el  produces  a  value  S'.  Let  t"  = 
t  +  6'. 

4.  If  t"  <  ti,  complete  the  temporal  transition  to  t"  and  ex¬ 
ecute  free  and  probabilistic  transitions  to  reach  a  PDRS 
s".  If  emissions  are  produced,  reject.  Otherwise,  con¬ 
tinue  sampling  from  s". 


5.  Iff"  ti,  force  t  to  be  cc|  li  a  I  to  ti,  compensating  with 
an  importance  weight  w.  Complete  the  temporal  transi¬ 
tion  to  ti  and  execute  free  and  probabilistic  transitions 
to  reach  a  PDRS  st.  If  the  emissions  m,  are  produced, 
add  Si  with  weight  w  to  the  set  of  weighted  particles  at 
ti.  Otherwise  reject  the  sample. 

There  is  a  subtlety  related  to  parallel  execution.  If  we  al¬ 
ways  resolve  the  delaying  subexpressions  in  the  same  order, 
some  subexpressions  will  never  be  key  delays.  If  the  observed 
emissions  are  the  result  of  those  delays  completing,  we  will 
never  get  an  accepted  sample.  In  technical  terms,  the  proposal 
distribution  will  not  be  positive  everywhere  the  true  distribu¬ 
tion  is.  To  get  around  this  problem,  we  randomly  choose  the 
order  in  which  delaying  subexpressions  are  resolved. 

To  derive  the  importance  weight,  we  reason  as  follows:  Let 
s  be  the  state  just  before  the  key  delay  is  resolved.  Let  s'  be 
the  state  in  which  the  key  delay  is  resolved  to  specify  the 
value  6'  =  ti  —  t.  Let  s"  be  the  state  with  time  stamp  tt  at 
the  end  of  the  temporal  transition  from  s'.  The  probability 
density  of  a  path  from  Sj_  i  through  s,  s'  and  s"  to  st  can  be 
written  as  p(s,:_i  s)p(s  s')p(s”  Si ),  where  si  s2 
denotes  the  event  the  si  leads  to  s2  through  some  sequence 
of  transitions,  because  p(s'  s'')  is  1.  Now  consider  our 
proposal  distribution.  We  sample  the  path  from  Sj_  ±  to  s  nor¬ 
mally,  and  similarly  the  path  from  s"  to  s,.  However,  we 
force  s  to  lead  to  s'.  This  happens  with  probability  P((f>), 
where  6  is  the  event  that  resolving  s  does  not  lead  to  nested 
delays  and  produces  a  value  5  >  ti  —  t.  Thus  the  proposal 
probability  density  of  the  path  is  p(sj_i  s)P(cj))p(s"  ^ 
si).  Therefore  the  importance  weight,  which  is  the  true  den¬ 
sity  divided  by  the  proposal  density,  is  p(s  s')/P((f>). 
P{(j>)  can  easily  be  estimated  using  forward  sampling.  It  re¬ 
mains  to  specify  how  to  estimate  p(s  ^  s'). 

5.2  Regression 

Our  algorithm  for  estimating  p(s  s')  works  by  “regress¬ 
ing"  the  target  value  <5  through  the  expression  of  state  s.  In¬ 
tuitively,  the  regression  process  takes  an  expression  e  and  a 
target  value  6,  samples  values  for  some  subexpressions  of  e, 
and  computes  what  target  value  S'  for  the  remaining  subex¬ 
pression  ef  would  allow  e  to  produce  S.  We  say  that  the  com¬ 
putation  regresses  to  el .  The  choice  of  which  subexpressions 
are  sampled  and  which  are  regressed  to  is  sometimes  random¬ 
ized. 

Due  to  the  restrictions  on  the  language,  we  only  need  to 
handle  some  of  the  kinds  of  expression.  Emissions  do  not 
need  to  be  handled  because  delaying  subexpressions  may  not 
produce  emissions.  Flip  expressions  also  do  not  need  to  be 
handled  because  they  produce  a  Boolean,  not  a  floating  point 
number,  and  similarly  lambda  expressions  produce  functions, 
not  floating  points.  For  delay  expressions,  meanwhile,  the 
density  is  0,  because  s  cannot  possibly  lead  to  s',  which  has 
the  same  timestamp  as  s. 

Constants,  variables,  and  now  expressions  are  not  continu¬ 
ously  distributed,  and  we  require  that  delay  times  be  contin¬ 
uously  distributed  at  the  time  of  evaluation.  Therefore  such 
expressions  cannot  appear  at  the  top  level  of  a  delaying  subex¬ 
pression.  However,  they  may  appear  as  subexpressions  of  a 
delaying  subexpression,  as  long  as  they  are  combined  with 
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other  subexpressions  in  a  way  that  allows  the  whole  delay¬ 
ing  subexpression  to  be  continuously  distributed.  If,  in  the 
process  of  regression,  we  encountered  a  constant,  variable,  or 
now  expression,  we  know  that  the  randomization  over  which 
subexpressions  to  regress  to  has  led  to  this  non-continuously 
distributed  one.  If  this  happens,  we  give  up  and  try  again.  In 
some  other  randomization,  the  constant,  variable  or  now  ex¬ 
pression  will  be  sampled  rather  than  regressed  to,  and  we  will 
be  able  to  succeed. 

The  base  case  of  the  regression  process  is  a  continuous 
probabilistic  choice  q  (e) .  For  such  a  choice,  note  that 

p(s  s')  =  p( q(e)  resolves  to  S') 

=  Y^vP(e  resolves  to  v)(q(v)(5')) 

We  can  estimate  the  density  by  sampling  a  resolution  for  e. 
If  it  results  in  a  nested  delay,  we  can  immediately  return  an 
estimate  of  0,  because  s  cannot  lead  to  s'.  If  it  results  in  a 
value  v,  we  compute  q(v)(S').  We  can  do  this  because  q  is  a 
probabilistic  primitive  for  which  we  can  calculate  densities. 

Conditionals  can  be  handled  easily.  To  compute  the  density 
that  if  ei  then  e-i  else  e 3  produces  6,  we  first  sample  e\. 
If  the  result  is  T,  we  return  the  density  that  e 2  produces  8, 
otherwise  we  return  the  density  that  e$  produces  8. 

Now  consider  binary  operators.  Let  e  be  e\  ®  e-i. 

p(s  s')  =  p(e  1  ®  e-2  resolves  to  5') 

J2VlP(e  1  resolves  to  Ui) 
p(e 2  resolves  to  9(v\,  5')) 

where  9(vi,8')  is  the  value  such  that  v\  ®  9(vi,8')  = 
S'.  (We  assume  that  binary  arithmetic  operators  are  in¬ 
vertible.)  We  can  estimate  the  density  by  sampling  a  res¬ 
olution  e\  of  V\,  and  then  recursively  computing  the  den¬ 
sity  that  e2  resolves  to  9{v\,8').  Now,  this  works  fine 
for  an  expression  such  as  1 . 0  +  uniform  (2.0),  but  it 
does  not  for  uniform  (2.0)  +  1.0,  because  after  we 
have  sampled  uniform  (2 . 0 ) ,  we  will  regress  to  1.0, 
which  is  not  continuously  distributed,  and  give  up.  So 
we  randomly  choose  which  operand  to  sample,  and  of 
which  to  compute  the  density.  For  an  expression  such  as 
uniform  (2.0)  +  exp  (1.0),  we  succeed  either  way. 
On  the  other  hand,  an  expression  such  as  2 . 0  +  1 . 0  is  im¬ 
possible,  because  it  is  not  continuously  distributed. 

To  handle  pairs  and  component  extraction,  the  regression 
algorithm  is  given  an  additional  argument,  which  is  a  tar¬ 
get.  The  target  is  either  *,  which  means  that  we  require  the 
entire  result  of  the  expression,  or  a  sequence  of  lefts  and 
rights.  For  example  the  target  left,  right  means  that 
we  need  the  right  component  of  the  left  component  of  the 
value  produced  by  the  expression.  When  we  want  to  compute 
p(left  e  produces  8)  with  target  r,  we  recursively  compute 
p(e  produces  8)  with  target  left .  r. 

For  pairs,  note  that  if  a  program  is  well-typed,  the  value 
of  a  delaying  subexpression  must  always  be  a  floating  point 
number,  so  if  such  a  subexpression  involves  a  pair  expres¬ 
sion,  we  must  only  need  the  value  of  the  target.  This  is  not  to 
say  that  the  other  component  does  not  need  to  be  evaluated. 
If  the  other  component  involves  a  nested  delay,  it  will  delay 
the  whole  pair.  Therefore  we  proceed  as  follows:  We  sample 
the  other  component  to  see  if  it  produces  a  nested  delay.  If 


it  does,  s  cannot  lead  to  s',  so  our  estimate  for  the  density  is 
0.  Otherwise,  we  compute  the  density  that  the  needed  com¬ 
ponent  produces  S'. 

For  first  expressions,  we  note  that 

p(first  [ei,  e2]  resolves  to  5')  = 
p(e  1  resolves  to  5')+ 

p(e  1  results  in  a  nested  delay)p(e2  resolves  to  8') 

We  therefore  recursively  compute  the  density  that  e\  pro¬ 
duces  8' .  We  also  sample  e±  to  see  if  it  results  in  a  nested 
delay,  and  if  it  does  we  add  the  density  that  e2  produces  S'. 

Unfortunately,  we  run  into  trouble  with  some  function  ap¬ 
plications.  Our  basic  strategy  for  function  applications  is 
to  sample  the  function  and  the  actual  argument,  and  then 
compute  the  density  that  the  body,  with  the  formal  argument 
bound  to  the  actual  argument,  resolves  to  8' .  This  works  fine 
in  many  cases,  but  not  when  the  body  of  the  function  involves 
no  continuous  probabilistic  choices.  An  example  of  an  ex¬ 
pression  that  causes  problems  is 

(lambda  x  .  if  x  >  3.0  then  x  else  2  *  x) 
(uniform (5.0) ) 

Our  algorithm  is  therefore  incomplete.  It  works  for  a  large 
and  very  useful  class  of  programs,  including  all  the  exam¬ 
ples  of  Section  3.  We  could  have  placed  a  restriction  on  the 
language  to  rule  out  function  applications  in  delaying  subex¬ 
pressions.  However,  we  think  that  would  have  been  too  dra¬ 
conian,  ruling  out  some  genuinely  useful  models  on  which 
our  algorithm  works  fine. 

We  now  argue  that  if  the  particles  at  time  L-i  are  con¬ 
sistent  with  the  emissions  at  time  L ,  with  positive  probability 
our  algorithm  produces  a  sample  with  positive  weight,  as  long 
as  the  model  is  one  for  which  our  density  computation  works. 
Let  7T  be  a  path  beginning  with  a  particle  at  £,:_i  that  produces 
the  emissions.  Let  t.  be  the  time  point  at  the  beginning  of  the 
temporal  transition  that  terminated  at  L.  Consider  the  delay¬ 
ing  subexpression  at  t.  that  resolved  to  L  —  t.  We  can  reorder 
the  resolution  of  delaying  subexpressions  such  that  this  delay 
was  the  last  to  be  resolved.  Let  s  =  ( e ,  t)  be  the  state  before 
this  subexpression  is  resolved.  There  is  an  interval  around  all 
the  continuous  probabilistic  choices  in  tt  before  s  is  reached, 
such  that  if  a  sampler  takes  all  the  same  discrete  choices,  and 
all  the  continuous  choices  in  these  intervals,  and  resolves  de¬ 
laying  subexpressions  in  the  same  order,  it  will  reach  a  state 
s'  =  (e,  £'),  where  t'  is  in  an  interval  around  t,  and  the  den¬ 
sity  that  e  will  resolve  to  L  —  t'  is  positive.  Since  we  random¬ 
ize  the  order  in  which  delaying  subexpressions  are  resolved, 
the  sampler  makes  all  these  choices  with  positive  probabil¬ 
ity.  Since  the  density  is  positive,  with  positive  probability  all 
samples  taken  during  the  course  of  the  regression  will  turn  out 
the  way  they  need  to  for  the  density  estimate  to  be  positive, 
so  the  importance  weight  will  be  positive. 

5.3  Implementation 

We  have  implemented  our  algorithm  and  tested  it  on  a  variety 
of  examples,  including  some  for  which  we  can  determine  the 
correct  probabilities  analytically,  to  demonstrate  its  correct- 
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ness.  For  example,  given  the  model 
let  f  (b)  = 

delay  uniform  (if  b  then  2.0  else  3.0)  ; 

emit  '  ' foo' ' ; 

delay  uniform  1.0; 

emit  dist  [  0.8  :  b,  0.2  :  !b  ]  ; 

delay  exp  1.0; 

b 

in  f  (flip  0.4) 

and  the  evidence  [1.0  :  “foo”,  1.5  :  T],  the  system  generates 
10, 000  particles  in  2.5  seconds  on  a  2GHz  single-core  Win¬ 
dows  machine.  It  predicts  that  the  probability  the  output  will 
be  T  is  0.8053;  the  correct  probability  is  0.8.  We  have  also 
run  it  on  the  music  example  of  Section  3.  With  4  voices  and 
20  notes  per  voice,  the  system  filters  the  observations  using 
100  particles  in  34  seconds. 

6  Discussion 

We  have  presented  a  powerful,  expressive  language  for  repre¬ 
senting  probabilistic  processes  in  continuous  time.  We  have 
presented  a  semantics  for  the  language,  and  an  inference  al¬ 
gorithm  that  works  for  a  large  and  useful  class  of  models. 

A  natural  question  to  ask,  however,  is  whether  this  lan¬ 
guage  is  needed  at  all.  After  all,  probabilistic  programming 
languages  can  define  models  over  arbitrary  data  structures. 
Perhaps  we  could  write  a  program  in  IBAL  or  Church  that 
would  define  a  probability  distribution  over  trajectories  and 
emission  sequences,  which  would  consist  of  values  of  vari¬ 
ables  and  timestamps.  We  could  then  run  the  general-purpose 
inference  algorithm  of  those  languages  on  our  program. 

While  such  an  approach  might  be  possible  some  time  in  the 
future,  there  are  several  reasons  why  it  is  a  good  idea  to  de¬ 
velop  a  special-purpose  continuous  time  language.  One  basic 
reason  is  that  temporal  models  are  so  important  that  they  de¬ 
serve  a  language  of  their  own.  It  would  be  much  more  cum¬ 
bersome  to  develop  them  in  a  more  general  language  from 
scratch.  Ideally,  one  might  hope  to  write  a  library  in  the  more 
general  language  that  would  make  developing  temporal  mod¬ 
els  straightforward,  but  that  has  not  been  done  yet.  In  fact, 
CTPPL  may  serve  as  a  guideline  for  designing  such  a  library. 

Another  reason  to  develop  a  special-purpose  temporal  lan¬ 
guage  is  that  time  is  special.  The  task  of  continually  monitor¬ 
ing  the  state  of  a  process  in  an  online  manner  is  different  from 
that  of  reasoning  about  an  entire  model,  therefore  it  requires 
different  algorithms.  In  particular,  we  use  particle  filtering 
whereas  Church  uses  MCMC. 

A  third,  fundamental,  reason  is  that  IBAL  and  Church  do 
not  allow  continuous  variables,  and  it  is  non-trivial  to  extend 
them  to  do  so.  Note  that  CTPPL  does  not  solve  the  prob¬ 
lem  of  continuous  variables  in  a  general  way.  Floating  point 
valued  emissions  are  not  allowed.  The  only  place  in  which 
CTPPL  fully  handles  continuous  variables  is  in  the  time  ele¬ 
ment.  CTPPL  is  able  to  do  this  because  of  the  special  struc¬ 
ture  of  time.  Time  always  moves  linearly  forward,  and  delays 
are  always  positive.  Our  algorithm  relies  crucially  on  this 
structure.  Nevertheless,  we  believe  our  regression  algorithm 
to  contain  the  seeds  that  will  allow  a  language  like  Church  to 


fully  incorporate  continuous  variables.  Thus,  even  though  it 
develops  a  special-purpose  language,  the  ideas  in  this  paper 
may  be  useful  in  general. 
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